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Abstract 



Scaling of folding times in Go models of proteins and of decoy 
structures with the Lennard-Jones potentials in the native contacts 
reveal power law trends when studied under optimal folding condi- 
tions. The power law exponent depends on the type of native ge- 
ometry. Its value indicates lack of kinetic optimality in the model 
proteins. In proteins, mechanical and thermodynamic stabilities 
are correlated. 



INTRODUCTION 

Proteins are extraordinary heteropolymers. They fold to their native states much faster 
than what a blind combinatorics would predict^ since a folding funnel in the energy land- 
scape is formedJUil Proteins are believed to have high designabilities,! to be stable against 
mutationsjli and to have the highest densities of statesJa Furthermore, the a helix secondary 
motifs have been shown theoretically to be the fastest folders among chains of the same num- 
ber, N, of aminoaci dl and be the result of the geometrical optimization of compact chains 
with maximum wiggle room.Ej Experimental results&ill (see also a commentary by ChanB) 
also point to the accelerating role of the helices. Biological evolution may have optimized 
functionality of proteins, but can proteins be optimal kinetically? 

Here, we consider folding times, £/ d m, in Go modelsM of proteins and decoy structures 
and show that though proteins fold to their native structures fast they are not optimal 
folders. This conclusion ties well with the protein engineering experimentsBli! which show 
that mutations in wild type proteins may lead to significant increases in folding rates and 
thus show no kinetic optimality of sequences. Our theoretical argument is based on relating 
universality classes in the scaling of t/ D ^ to classes of native geometries. This confirms a 
decisive role of native geometry in determining properties of proteins.0 The scaling trends 
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that we observe are robust when studied at the temperature of the fastest folding, T min , but 
become obscure when studied at other temperatures. 

Another issue which we examine here regards the notion of protein stability. One defi- 
nition of stability is thermodynamic - it assesses the role of non-native phase space valleys 
relative to the native valley by determining the probability of staying in the native basin. 
It is characterized by the folding temperature, Tf, at which this probability crosses |@ 
Another is mechanical: at what temperature will the native conformation melt due to vi- 
brations? The mechanical definition does not refer to non-native valleys. The two notions 
should correlate with each other if the native valley dominates in the energy landscape. We 
show that this is indeed what happens in model proteins. 



MODEL AND METHOD 

We first consider the problem of universality classes in the scaling of folding properties. 
There have been various predictions about the nature of scaling of tj D ^. A number of theories 
suggested a power law dependence of barrier heights on N and thus an exponential law for 
tj oW .EHEI Thirumalai,@ however, has argued in favor of a power law for tf Q i d , 

t f oid~N x , (1) 

where A is estimated to be between 3.8 and 4.2 for simple two-state folders. A heuristic 
modelil leads to A = 3. Numerical studies of tf oid in various lattice mo delsM have 
supported the power law behavior and indicated dependence of A on specifics of the model, 
dimensionality and temperature, T. For designed sequences in three dimensions, A has been 
found to be in the Thirumalai ranged whereas for Go models it has been found to be of 
order 3.0@ In these studies, tf id is defined as the first passage time. 

Here, we extend the scaling studies to off-lattice Go models^! and consider chains of of 
beads separated by do ~ 3.8A - a typical length of the peptide bond. The Go Hamiltonian 
is defined through a native conformation of a sequence since it assigns relevant interaction 



energies only to the native contacts. Despite this simplification, the Go models may behave 
more realistically than atomistic models.^ 

It should be noted that the Go models are so minimal that they disregard an explicit 
amino acidic definition of a protein and variablity of the volume taken by individual side 
chains. Natural proteins appear to fold by locking its segments together in an unfrustrated 
way. Adding attraction to the non-native contacts in the bead-spring model might seem to 
be making the model more realistic but, in fact, it leads to spurious entanglements during 
folding. In this sense, the Go model repairs some of its shortcomings by a mutual cancellation 
of its ills and focuses on the effects related to the native structure. This focus is justified 
by experimental indications that the native structure itself is central to folding. 00 On the 
other hand, the target oriented aspects of such theoretical modeling are hard to justify 
on a fundamental level.il The nature of the Go model allows one to study the role of the 
native structure in kinetics but it does not allow to address the role of the sequential order. 
Determining sequence based, as opposed to structure based, classes of kinetic universality 
would be much more interesting but, clearly, also much more challenging. 

for the native con- 
tact interactions between monomers i and j, in a distance of apart. The non-native 
interactions are described by repulsive soft core potentials that provide excluded volume 
and prevent entanglements. Our approach is presented in details in Ref. where several 
secondary structures and three model proteins have been analyzed. Such models were also 
studied in Ref.HH The distances between successive beads are controlled by an anharmonic 
potential. The length parameters cry are selected so that the minimum of corresponds 
to the geometry found in the target structure and the contacts are said to be formed when 
i and j are not consecutive along the chain and is less than d na t, where d na t is 7. 5 A. 

There are other variants of the off-lattice Go models: Zhou and KarplusS and Dokholyan 
et al@ have considered models with a square well potential. Clementi et al.0 have studied 
the 12-10 power law potentials. It is not clear which effective potential is the best and our 
choice is LJ. 



The dynamics of the system are described by the Langevin equation mi = — 71- + F c + F 
where r is a position of a monomer, m is its mass and F c is the force derived from the 
Hamiltonian. 7 is a friction coefficient and T is the random force such that (r(O)r(t)) = 
2 / ykBT5(t) , where ks is the Boltzmann constant, t is time and 5(t) is the Dirac delta 
function. Both the friction and the random force represent the effects of the solvent and 
they control T. The equations are solved using the fifth order predictor-corrector scheme. 

In the following, T is measured in the units of e/kg and t is measured in units of the 
oscillatory period r. At low values of friction, r is equal to y / ma 2 /e, where a is a van der 
Waals radius of the amino acid residues. The value of a is chosen as 5A which is roughly 
equal to (cr^-) in our model proteins. The simulations are done with 7 = 2mr _1 - a standard 
choice in studies of liquids. Higher values of 7 have been argued to be more realisticll We 
have shownlll that tj D ^ is linear in 7 and T min depends on 7 weakly. 

The native conformation is defined through the locations of the a carbons. We have 
considered 21 single domain Protein Data Bank (PDB) structures^ with N ranging between 
29 and 98. 9 of these structures belong to a set of proteins considered by Plaxco et al.0 
or are their close homologies. These are: the SH3 domain of lefn (57), 2ptl (63), 2ci2 (83- 
18=65; 18 are not resolved), lcsp (67), lubq (76), lhdn (85), 2abd (86), lten (90), and laps 
(98), where the numbers in brackets indicate the corresponding values of N. The additional 
12 structures are: lcti (29), lcmr (31), lce4 (35), lbba(36), lerc (40), lcrn (46), 7rxn (52), 
5pti (58), ltap(60), laho (64), lptx (64), lerg (70). These conformations were picked from 
the low-iV end of the size distribution to allow for a reliable characterization. Our studies of 
these structures indicate well defined overall trends in tf Q id which are only weakly affected 
by an inclusion of steric constraints.^ Our results will be given here only for models without 
such constraints. 

The results obtained for the PDB structures are compared to five classes of decoy con- 
formations which differ in the way they fill space and in their packing arrangements. These 
classes form statistical ensembles in which a given value of N has multiple realizations. Four 
classes are defined in terms of shapes that homopolymers arrive at under various cooling 



procedures. The non-consecutive beads in the homopolymers interact through the LJ po- 
tential with <7jj = 5A which corresponds to a typical van der Waals radius of aminoacids. 
We discuss the following classes (see Figure 1): 

HC: conformations obtained through slow homopolymer cooling. The procedure involves 
generating an open conformation, assigning identical strengths to all inter-bead inter- 
actions, and then slowly annealing. The resulting compact conformation serves as a 
native structure in the Go-like Hamiltonian. 

HQ: similar to HC but with a rapid quenching instead of annealing. The procedure results 
in non-compact native structures which, however, have many local contacts, as mea- 
sured along the chain, and are thus more closely related to a-helices than to random 
heteropolymers. 

HA: similar to HC but the a-helices of various lengths (of order 15) are first built into the 
initial states in a way which is consistent with the LJ couplings and then preserved 
through the annealing process by assigning ten times stronger couplings to the helical 
secondary structures. 

HB: similar to HA but the helical segments are replaced by /3-sheet conformations. The 
lengths of the /3-strands are fixed at 8 monomers. 

CL: compact native conformations generated on a grid as a self- avoiding random walk 
within a compact box of lattice constant equal to the length of a peptide bond and 
then stabilized by appropriate Lennard- Jones interactions. 

The folding properties are studied as a function of T and then presented here for T=T min 
and Tf. at which probability, P , of being in the native basin is \. P is determined based 
on 10-15 long molecular dynamics trajectories at equilibrium. The results are illustrated in 
Figure 2 for two model proteins lubq and lce4. 



The median folding times depend on T in a U-shaped fashion and, generally, the bigger 
the N, the narrower the U. The dependence of tf u for the Go models of lubq and lce4 Is 
shown at the bottomn of Figure 2. 

The system is assumed to be in its native state if all of its native contacts are established. 
A native contact between monomers i and j is said to be established if is shorter than 
1.5cr ? ;,-. 



RESULTS AND DISCUSSION 

Kinetic Universality Classes 

Figure 3 shows the validity of the power law, eq. (1), for tf a i d when determined under 
the most favorable kinetic conditions - at T m j n . The exponent A sensitively depends on 
the geometrical class of the native structures (Table I). The case of HB is special since a 
crossover between two effective values of A is observed (the /3-strand lengths of 8 impose a 
condition on N above which a characteristic /3-sheet behavior can start to be seen). The 
values of A range between 1.7 and 3.2. The smallest A corresponds to the HA and the largest 
to the HQ and long HB conformations. HC is intermediate. Note that A for HA is smaller 
than 2 - the value suggested by de Gennes0 in his analysis of the time scale for the coil to 
globule transition of a homopolymer. 

The data points for PDB at T m i n are somewhat scattered - there is no averaging over 
an ensemble - but a well defined trend is visible. The exponent A is about 2.5 ± 0.2 which 
indicates that these structures are not optimal kinetically. HA, short HB, and HC of the 
same N fold faster. PDB appears to be comparable to the grid conformations CL (there 
is only a week dependence on the dimensionality in the off-lattice models - when the grid 
structures are generated on the square lattice, A becomes equal to 2.1 ± 0.2). 

The existence of a trend in the scaling of tf a id for the PDB structures appears to be at 
odds with the analysis of experimental data compiled by Plaxco et al.,0 and replotted here 
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in Figure 4, which indicated lack of any correlations with N. A flavor of this is already seen 
in Figure 3 which shows that one sequence (laps) has a tf id which is distant from the trend. 
This sequence appears to be frustrated geometrically and it has a very small experimental 
folding rate@ - maybe it is just a poor folder. However, the experimental data indicate a 
significantly larger scatter of values than as seen in Figure 3. 

There are three explanations of the discrepancy that we considered. First: the range of 
the values of N considered in Ref.0 is smaller than studied here, which in itself emphasizes 
fluctuations. However, in data that were published laterllH the range of iV was extended 
to about 150 and the correlations of kinetics with N remained weak so the limited range 
of the values of N is not a likely explanation of the discrepancy. Second, it is only the 
simplified models, like the Go models, that show trends in the kinetics of folding whereas any 
additional complexities present in real systems may perturb such trends beyond detection. 
This possibility could be studied in the future by considering scaling in more complicated 
classes of models. In particular, the role of the localization index of the interactions should 
be elucidated in the context of scaling. Third: the trends are derailed by the fact that the 
experimental data are usually obtained at a fixed temperature, typically, but not necessarily, 
at the room temperature. Thus the data collection involved no kinetic optimization which 
would require selecting the best T for each protein individually. 

The role of this third possibility is illustrated at the top of Figure 4 which shows the 
scaling of tfou at Tf. The scatter is seen to be significantly larger than at T m j n . It is not as 
large as in the experimental data but it should be noticed, again, that our Go systems are 
just very simple models of systems which are quite complicated. Another way to asses the 
relevance of the optimal selection of T is shown in Figure 5 which reanalyzes the data of 
figures 3 and 4 so that theoretically determined tj id is plotted against the experimentally 
measured folding time, t exp . The small number of available points makes it hard to place 
a bet on the best trend. However, it is clear that the points determined at T min exhibit 
significantly less scatter than those calculated at Tf. This finding gives further support 
for the idea that the lack of optimization in the temperature may mask existence of any 
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underlying trends. 

Our findings on scaling of characteristic T's can be summarized as follows. For HC, HA 
and CL, T min grows with N whereas Tf is almost constant and somewhat lower than T min . 
The difference between T min and Tf grows most slowly for HA. For PDB, T min does not seem 
to have a trend, within the range of N studied, and the values of T m j n are usually just above 
the corresponding values of Tf. This indicates a borderline behavior between excellent and 
poor folding characteristics, if the condition for the latter is Tf « T min .0'0 This borderline 
behavior might characterize classes of proteins, especially of those which have a short lifetime 
in a living cell but this result may depend on the choice of the potentials. 

Stability Against Vibrations 

We now discuss stability of the native state in proteins. The mechanical stability can 
be probed through the phononic spectra as in Refill. This is accomplished by determining 
the frequency gap, u 1 in the low end of the frequency spectrum. Another test is provided 
by studying root mean square displacements around the native state and employing the 
Lindemann criterion for melting. We introduce the parameter 



which is a variant of the parameter used by Takano et al.e3 The summation is over pairs of 
monomers (n of them) which form native contacts and their rms displacement is compared 
to the native distances. The temperature, Tl, at which Sl crosses the Lindemann value of 0.1 
is a measure of mechanical stability. Figure 6 shows that both U\ and Tl show a correlation 
with Tf which suggests the predominance of the native valley in the energy landscape. Note 
that Tl is higher than Tf which indicates that the probability "leaks out" of the native 
state already when the vibrations in the native valley are small. Thus for good folders, the 
notions of thermodynamic and mechanical stabilities qualitatively coincide. 
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CONCLUSIONS 



Our main results on scaling can be summarized as follows. There are kinetic universality 
classes among well folding sequences. These classes depend on the type of geometry involved 
in the native state. Well defined scaling trends can be established if folding is studied under 
optimal conditions. Otherwise they are hard to be seen, especially if the range of system 
sizes is narrow. The shapes of actual proteins in their native states are such that the folding 
times scale with an exponent which is higher than certain artificial classes of structures. 
This suggests the lack of kinetic optimality of proteins. 

Our results have been obtained within the Go model which focuses on the role of the 
native state geometry. This level of simplified description incorporates a long list of ap- 
proximations which somehow appear to compensate mutually. In effect, the Go systems are 
reasonable models of good folders and the simplifications involved are precisely of the kind 
that allow for a statistical analysis necessary to establish scaling properties. Working with 
sequences described in a more sophisticated manner would add to the reality of description. 
However, it would also necessitate dealing with statistical ensembles of sequences defined by 
more parameters than just the size and shape and currently that would be prohibitive nu- 
merically. Our results should then be viewed as establishing first inroads into understanding 
of the role of size in folding kinetics. 
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TABLES 

TABLE I. The values of exponent A for the classes of conformations studied. 



Structure A 

HC 2.2 ±0.1 

HA 1.7 ±0.1 
HB 0.9 ±0.1, 3.2 ±0.1 

HQ 2.7 ±0.2 

CL 2.6 ±0.2 

PDB 2.5 ±0.2 
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FIGURE CAPTIONS 



Fig. 1. Examples of native conformations that were used in these studies. The folding data 
were generated based on 11 realizations of each class of structures for each value of N, 
except for the case of HA when 5 realizations were sufficient. 

Fig. 2. Equilibrium and kinetic properties of the Lennard- Jones-Go model of two proteins, 
lce4 (solid lines) and lubq (dotted lines), as a function of temperature. The top panel 
shows the probability of staying in the native state and the bottom panel shows the 
median folding time as determined based on 200 trajectories at each temperature. 

Fig. 3. Power law dependence of the median folding times at T m j n on N for the indicated 
classes of geometry of the native conformations. The proteins analyzed by Plaxco et 
al.0 are shown as open squares. Closed squares correspond to other proteins. For CL, 
the times are multiplied by 10. 

Fig. 4. Top: Values of tf Q id for the PDB structures as determined at Tf. The open circles 
refer to the proteins studied by Plaxco et al.0 The line shows the scaling trend found at 
T min . Bottom: Experimentally determined folding times, t exp , (inverses of the folding 
rates) as compiled by Plaxco et al.0 

Fig. 5 Folding times of the theoretical Go model versus folding times observed experimen- 
tally in proteins studied by Plaxco et al.tl. The solid and open squares correspond to 
T = T min and T = Tf respectively. 

Fig. 6. Bottom: Tl as a function of Tf for the PDB structures studied. Top: The lowest 
non-zero phononic frequency of the same structures plotted vs. Tf. The broken lines 
indicate overall trends. The inset illustrates the dependence of 5l (eq. 2) on temper- 
ature, T, for the Go model of crambin. The horizontal line indicates the 10% value of 
5 L . 
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